AD-A070 024   CALIFORNIA UNIV BERKELEY STATISTICAL LAB   F/G 12/1
CLUSTERING: REMINISCENCES OF SOME EPISODES IN MY RESEARCH ACTIV-- ETC(U)
1979   J. NEYMAN   N00014-75-C-0159


UNCLASSIFIED 


CU-SL-79-03-ONR 




DISTRIBUTION  STATEMENT  A 


Approved  for  public  release; 
Distribution  Unlimited 


DOCUMENT CONTROL DATA - R & D

Statistical Laboratory, University of California, Berkeley, California 94720
Unclassified


Clustering:  reminiscences  of  some  episodes  in  my  research  activity. 



Author: Jerzy Neyman




Contract: ONR N00014-75-C-0159




Originator's report number: ONR 79-03




Distribution statement: This document has been approved for public release; its distribution is unlimited.


Sponsoring military activity: Office of Naval Research, Washington, D.C.


ABSTRACT


The idea of a stochastic process of clustering came to the author's attention from Dr. Geoffrey Beall, an entomologist interested in the distribution of larvae over an experimental field. Larvae are born from eggs deposited by moths, not singly, but in "egg-masses." After hatching, the larvae begin to crawl in search of food. Later, a "general census" of larvae is performed. The random variable of interest is X, the number of larvae counted in a unit-area plot in the field. Conceptual elements: cluster centers (= egg-masses), cluster size (= number of larvae from a single egg-mass), and dispersal of cluster members. Over the four decades since the publication of the theory relating to larvae, essentially the same mechanism of clustering was found to underlie many diverse natural phenomena: clustering of galaxies, population dynamics, epidemics, and effects of irradiation of living cells.




CLUSTERING:
REMINISCENCES OF SOME EPISODES IN MY RESEARCH ACTIVITY*

Jerzy  Neyman 
Statistical  Laboratory 
University of California, Berkeley 94720


1. Introduction. It is a pleasure to be able to deliver this Second Pfizer Annual Colloquium. In selecting its subject I thought of work in our Berkeley Stat. Lab. relating to pharmacology. However, as of now, this work is not sufficiently advanced to be reported on this important occasion.

The subject I selected covers a long series of interconnected studies in several substantive domains, all of them reflecting the inspiration I received in the late 1930's from Dr. Geoffrey Beall, then of the Dominion Entomological Experimental Station, Chatham, Ontario. Regretfully, I have no references to Dr. Beall's publications.

It happened that, while Dr. Beall's preoccupation was with a special kind of entomological experiment, the idea for which I feel indebted to him, namely the phenomenon of clustering, proved to be very relevant in the following diverse domains: (i) in the study of spatial distributions of galaxies, (ii) in population dynamics, (iii) in the theory of epidemics, and (iv) in the study of radiation carcinogenesis.

*Approximate text of the presentation at The Pfizer Colloquium at the Department of Statistics, The University of Connecticut, April 30, 1979.

While this presentation is intended to reflect my personal experiences, it may also be considered a contribution to the history of a concept, the concept of "clustering." This historical sketch covers four decades. The unavoidable consequence is that most of the developments described are "symbolized" rather than studied in depth. Still, my hope is that the evolution of the original simple concept will be found intelligible.

The plan of the present paper is as follows. Section 2 outlines the original problem of Dr. Beall, concerned with counts of larvae in plots of an experimental field. The corresponding stochastic process may be labeled that of "single clustering."

Section 3 is also concerned with the "single clustering" process. However, the substantive domain is very different: the distribution of galaxies in space.

The subjects of all the subsequent sections are concerned with sequences of consecutive clusterings, that is, with the process of clustering of clusters. The natural phenomena studied include population dynamics (Section 4), radiation carcinogenesis (Section 5), and the theory of epidemics, first "outdated" (Section 6) and later "modernized" (Section 7). Section 8 contains concluding remarks.

2. Single Clustering: Counts of Larvae in Plots of an Experimental Field. Consider a large, reasonably uniform experimental field divided into a number of unit area plots. Consider one of these plots, which it will be convenient to describe as the "target." At a particular period during the summer, moths fly over the field and, from time to time, deposit their eggs. These eggs are not deposited singly but in "masses," each composed of a large number of eggs. In due course, these eggs produce larvae, which begin to crawl in search of food. After a certain period of time, when the larvae are somewhat larger and convenient to count, a general census is performed, and our interest is concerned with the number, say X, of larvae counted in the target plot.

The concept of clustering is connected with the fact that the larvae cannot travel fast. The possibility of one of them being found in the target plot depends on the distance of the egg-mass (= "cluster center") from which the given larva emerged. Thus, the conceptual counterpart of the empirical phenomenon relating to a single "target" plot must involve a bigger area, the "area of accessibility" surrounding the target.

The whole mechanism of clustering involves, then, the following concepts: (i) the distribution of egg-masses ("cluster centers") over the field, (ii) the number of larvae from a single mass surviving up to the census (the number of cluster members, or "size" of the cluster), and (iii) the mechanism of "dispersal" of cluster members and the implied "area of accessibility."
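The three conceptual elements just listed can be made concrete in a short Monte Carlo sketch. Every specific choice here is an assumption for illustration, not part of the 1939 model: egg-masses are scattered as a Poisson process over a square area of accessibility, the number of larvae surviving from each mass is Poisson, and each larva's displacement is Gaussian. The function and parameter names are hypothetical. The point of the run is only that clustering makes the variance of X exceed its mean, which a Poisson law for X cannot reproduce.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def count_in_target(rng, mass_density=0.5, mean_larvae=4.0,
                    spread=0.7, half_width=3.0):
    """One realisation of X, the larvae counted in the unit target plot [0,1) x [0,1).

    All parameter values are hypothetical:
      mass_density -- expected egg-masses (cluster centres) per unit area,
      mean_larvae  -- mean cluster size (larvae surviving to the census),
      spread       -- s.d. of each larva's Gaussian displacement (dispersal),
      half_width   -- margin of the square "area of accessibility" around the plot.
    """
    lo, hi = -half_width, 1.0 + half_width
    n_masses = poisson(rng, mass_density * (hi - lo) ** 2)
    x = 0
    for _ in range(n_masses):                       # cluster centres, uniform in the area
        cx, cy = rng.uniform(lo, hi), rng.uniform(lo, hi)
        for _ in range(poisson(rng, mean_larvae)):  # cluster size
            px = cx + rng.gauss(0.0, spread)        # dispersal of one cluster member
            py = cy + rng.gauss(0.0, spread)
            if 0.0 <= px < 1.0 and 0.0 <= py < 1.0:
                x += 1
    return x


rng = random.Random(1939)
counts = [count_in_target(rng) for _ in range(3000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean {mean:.2f}  variance {var:.2f}  variance/mean {var / mean:.2f}")
```

With these hypothetical settings the mean of X stays close to 0.5 × 4 = 2 larvae per plot, while the variance comes out noticeably larger than the mean.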




Naturally, the details of these three conceptual elements must vary from one empirical domain of study to the next. Figures 1 and 2, both representing details of the original publication of 1939, are intended to "symbolize" the contemporary thinking. It will be seen that at the time when the paper was written, in 1936 or 1937, Dr. Beall and I did not dream that the mechanism of clustering could be relevant to the understanding of phenomena such as the clustering of galaxies, etc. The then contemplated applications were "entomology" and "bacteriology."


ON A NEW CLASS OF "CONTAGIOUS" DISTRIBUTIONS, APPLICABLE IN ENTOMOLOGY AND BACTERIOLOGY
By J. Neyman

CONTENTS                                                            PAGE
1. Introduction                                                       35
2. Distribution of larvae in experimental plots                       36
3. Particular classes of the limiting distribution of X               40
4. Certain general properties of the distributions deduced            43
5. Contagious distribution of type A depending on two parameters      45
6. Contagious distribution of type A depending on three parameters    48
7. Contagious distributions of types B and C                          53
8. Illustrative examples and concluding remarks                       54
9. References                                                         57

Figure 1. A detail of the title page of the original paper of 1939. Ann. Math. Stat., Vol. 10 (1939), pp. 35-57.




TABLE I

Distribution of European corn borers in 120 groups of 8 hills each (data provided by Dr. Beall), fitted by Poisson Law (P.L.) and by type A Law with two parameters (T.A.)

No. of                   Frequency
borers      Exp. P.L.    Observed    Exp. T.A.
0              5.0          24          22.6
1             16.0          16          16.7
2             25.3          16          18.3
3             26.7          18          16.4
4             21.1          15          13.4
5             13.4           9          10.3
6              7.1           6           7.5
7              3.2           5           5.2
8              1.3           3           3.5
9               .4           4           2.3
10              .1           3           1.5
11                           0
12                           1
Beyond                                   2.3
m1              -                       2.178
m2              -                       1.454
P            .000,000                    .95

TABLE II

Distribution of yeast cells in 400 squares of haemacytometer observed by "Student" (1907), fitted by Poisson Law (P.L.) and by type A Law with two parameters (T.A.)

No. of                   Frequency
cells       Exp. P.L.    Observed    Exp. T.A.
0              202         213         214.8
1              138         128         121.3
2               47          37          45.7
3               11          18          13.7
4                2           3           3.6
5                -           1            .8
Beyond           -           -            .1
m1               -                      3.605
m2               -                       .189
P             > .02                     > .1

Figure 2. Two tables published at the end of the original paper of 1939.
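The two type A parameters in Table I can be recovered from the observed counts, as a sketch, by the method of moments, using the type A relations mean = m1·m2 and variance = m1·m2·(1 + m2). With the sample variance (divisor n - 1) this reproduces the tabulated m1 = 2.178 and m2 = 1.454; the agreement suggests, but the text does not state, that this was the fitting method used. The probabilities are evaluated with the standard compound-Poisson (Panjer) recursion, a modern convenience rather than the 1939 procedure; the function names are hypothetical.

```python
import math

# Observed frequencies of 0, 1, ..., 12 corn borers per group (Table I, 120 groups).
observed = [24, 16, 16, 18, 15, 9, 6, 5, 3, 4, 3, 0, 1]


def moment_estimates(freqs):
    """Method-of-moments estimates of (m1, m2) for the type A law,
    from mean = m1*m2 and variance = m1*m2*(1 + m2)."""
    n = sum(freqs)
    mean = sum(k * f for k, f in enumerate(freqs)) / n
    var = sum(f * (k - mean) ** 2 for k, f in enumerate(freqs)) / (n - 1)
    m2 = var / mean - 1.0
    return mean / m2, m2


def type_a_pmf(m1, m2, kmax):
    """P(X = k) for k = 0..kmax: a Poisson(m1) number of clusters, each of
    Poisson(m2) size, evaluated by the compound-Poisson (Panjer) recursion."""
    probs = [math.exp(m1 * (math.exp(-m2) - 1.0))]
    c = m1 * m2 * math.exp(-m2)
    for k in range(1, kmax + 1):
        s = sum(m2 ** (j - 1) / math.factorial(j - 1) * probs[k - j]
                for j in range(1, k + 1))
        probs.append(c * s / k)
    return probs


m1, m2 = moment_estimates(observed)
n = sum(observed)
expected = [n * p for p in type_a_pmf(m1, m2, 10)]
beyond = n - sum(expected)
print(f"m1 = {m1:.3f}, m2 = {m2:.3f}")
for k, e in enumerate(expected):
    print(k, round(e, 1))
print("beyond", round(beyond, 1))
```

With the Table I counts this prints m1 = 2.178 and m2 = 1.454, and the leading expected frequencies agree closely with the Exp. T.A. column.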




3. Clustering of Galaxies. As is generally known, California is the "land of big telescopes." Having lived in Berkeley since August 1938, I could not avoid becoming exposed to statistical problems of astronomy and, more specifically, of "extragalactic astronomy" or cosmology. In particular, I must record the inspiring influence of two "red-blooded" astronomers, N.U. Mayall and C.D. Shane, at the time both at the Lick Observatory, of which Shane was the director.

The principal subject of our studies was the question whether, by and large, the distribution of galaxies in space is clustered or, as was broadly believed, the galaxies are distributed in space singly, perhaps approaching a Poisson process. Beginning in 1952, there resulted a substantial sequence of publications, frequently co-authored by astronomers. These are exemplified by the following references:

J. Neyman and E.L. Scott, "A theory of spatial distribution of galaxies," Astrophysical Journ., Vol. 116 (1952), pp. 144-163.

J. Neyman, C.D. Shane and E.L. Scott, "On the spatial distribution of galaxies: a specific model," Astrophysical Journ., Vol. 117 (1953), pp. 92-133.

E.L. Scott, "The brightest galaxy in a cluster as a distance indicator," Astronomical Journ., Vol. 62 (1957), pp. 248-265.

J. Neyman, "Sur la théorie probabiliste des amas de galaxies et la vérification de l'hypothèse de l'expansion de l'univers," Annales de l'Institut Henri Poincaré, Vol. 14 (1955), pp. 201-244.

J.L. Lovasich, N.U. Mayall, J. Neyman, and E.L. Scott, "The expansion of clusters of galaxies," Proc. Fourth Berkeley Symp. on Math. Stat. and Prob. (J. Neyman, ed.), Univ. of California Press, Berkeley and Los Angeles, Vol. 3 (1961), pp. 187-227.




It will be realized that, compared to the clustering of larvae in a field, the cosmological aspect of the phenomenon of clustering is immeasurably more complex. One reason is the impossibility of approaching a cluster and counting its members! All the astronomer can do is look at photographs of the sky, count the images of galaxies on them, and study the images of single galaxies and the spectra of the light they emit. The ingenuity the astronomers exhibit is really remarkable. Also, there is here a most encouraging east/west intellectual cooperation. I had the pleasure of describing it in my latest publication dealing with cosmology. The reference is: "Reminiscences of a Revolutionary Period in Cosmology," Problems of Physics and Evolution of the Universe (Festschrift for V.A. Ambartsumian), L.A. Mirzoyan, ed., Publ. House of the Armenian Acad. of Sciences, Yerevan (1978), pp. 243-249.

4. Population Dynamics: Sequence of Clustering of Clusters, of Clusters, etc. Here the most relevant publication, co-authored with E.L. Scott, has the title: "On a mathematical theory of populations conceived as conglomeration of clusters." It appeared in 1957 in Vol. XXII of Proc. Cold Spring Harbor Symposia on Quantitative Biology, pp. 109-120.

The problem studied can be summarized as follows. Consider an infinite plane H representing the "habitat," and let $R_1, R_2, \ldots, R_s$ be any s non-overlapping regions in H. Also, let $m_1, m_2, \ldots, m_s$ be s arbitrary non-negative integers. The characterization of the distribution of a population inhabiting H is understood to mean a rule for determining the probability that, simultaneously, the number of population members in $R_1$ will be exactly $m_1$, the number of them located in $R_2$ will be exactly $m_2$, etc.

In other words, if $X_i$ stands for the number of population members in $R_i$, the problem was to deduce the formula for the probability generating function of the random variables $X_1, X_2, \ldots, X_s$.
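For a single clustering the generating function asked for has a compact form. As a sketch, assume (these assumptions are chosen to match the mechanism of Section 2 and are not quoted from the 1957 paper) that cluster centers form a Poisson process on H with a hypothetical intensity $\rho(c)$, that cluster sizes are independent with common probability generating function $\psi$, and that each member, independently of the others, comes to rest in $R_i$ with probability $p_i(c) = \int_{R_i} f(x \mid c)\,dx$, where $f$ is a dispersal density. Writing $X_i$ for the number of members in $R_i$,

$$
G(t_1, \ldots, t_s) = E\Big[\prod_{i=1}^{s} t_i^{X_i}\Big]
= \exp\Big\{ \int_H \rho(c)\Big[\psi\Big(1 - \sum_{i=1}^{s} p_i(c)\,(1 - t_i)\Big) - 1\Big]\,dc \Big\},
$$

and the probability that $X_i = m_i$ for every $i$ is the coefficient of $t_1^{m_1} \cdots t_s^{m_s}$ in $G$. The argument of $\psi$ is the generating function of where one member lands: in $R_i$ with probability $p_i(c)$, outside all the regions otherwise. Repeating this construction generation by generation is the starting point for the clustering of clusters.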

The  mathematical  assumptions  used  in  the  work  are  reducible 
to  a repetition  of  the  process  of  single  clustering.  A litter, 
having  one  or  more  members,  is  born  at  a point  (=  cluster  center) 
in  H.  The  members  of  the  litter  (cluster)  disperse  and  gradually 
die  out.  Before  dying,  some  of  the  litter  members  produce  their 
own  litters  of  progeny,  etc.  The  particular  object  of  study  is 
the joint distribution of two successive generations of the
population,  the  paternal  and  the  filial.  Also,  some  asymptotic 
results  are  obtained. 
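The two-generation mechanism lends itself to direct simulation. In this sketch the habitat is replaced, for simplicity, by a circle of circumference 20 (to avoid edge effects), litter centres are Poisson, litter sizes are Poisson, dispersal is Gaussian, and each paternal member founds a litter of its own with a fixed probability; all of these choices, the parameter values, and the function names are hypothetical. The pairs (paternal count, filial count) in one region R = [0, 1) exhibit the dependence between generations that the joint distribution has to describe.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def count_in_region(points, left=0.0, right=1.0):
    """Number of points lying in the region R = [left, right)."""
    return sum(1 for p in points if left <= p < right)


def two_generations(rng, circumference=20.0, centre_rate=0.5, litter_mean=5.0,
                    founder_prob=0.5, filial_mean=3.0, spread=0.5):
    """(paternal, filial) counts in R for one realisation; parameters are hypothetical."""
    parents = []
    for _ in range(poisson(rng, centre_rate * circumference)):
        centre = rng.uniform(0.0, circumference)      # litter born at a cluster centre
        for _ in range(poisson(rng, litter_mean)):    # litter members disperse
            parents.append((centre + rng.gauss(0.0, spread)) % circumference)
    children = []
    for home in parents:
        if rng.random() < founder_prob:               # member produces its own litter
            for _ in range(poisson(rng, filial_mean)):
                children.append((home + rng.gauss(0.0, spread)) % circumference)
    return count_in_region(parents), count_in_region(children)


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


rng = random.Random(1957)
pairs = [two_generations(rng) for _ in range(2000)]
paternal = [p for p, _ in pairs]
filial = [f for _, f in pairs]
print("mean paternal", sum(paternal) / len(paternal))
print("mean filial", sum(filial) / len(filial))
print("correlation", round(correlation(paternal, filial), 2))
```

Under these assumptions the mean paternal count is 0.5 × 5 = 2.5 per unit length and the mean filial count 2.5 × 0.5 × 3 = 3.75, and the two counts in the same region are strongly positively correlated, because offspring are born near their parents.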

While the distribution of a species over the habitat appears as a domain very different from that of the distribution of galaxies in space, there are some important analogies. In either case no definitive empirical verification of the hypothetical details is possible. All one can do is perform a Monte Carlo simulation and compare the results with such fragmentary observations as it may be possible to accumulate.

5. Radiation Carcinogenesis. Of the problems we studied in our Berkeley Stat. Lab., the chance mechanisms governing carcinogenesis, particularly radiation carcinogenesis, may well be the most difficult and, perhaps, the most important. I describe them ahead of epidemics because at the present moment there is something like a "lull" in our efforts.

Obviously, in order to achieve significant results leading to an understanding of the chance mechanism, or mechanisms, in living cells or tissues, a statistician must depend upon the interested cooperation of an experimenting biologist. It happens that, at this moment, we lack the necessary contacts. On the other hand, as will be described in Sections 6 and 7, our studies of the mechanisms of epidemics are developing at a reasonable rate.

Because the phenomenon of radiation carcinogenesis appears very distant from that of the distribution of larvae and, certainly, from cosmology, one is likely to believe that the underlying chance mechanisms could have nothing in common. Yet closer examination indicates the contrary. Such differences as exist are differences of complexity.

The  problems  studied  and  the  results  obtained  are  summarized 
in  a relatively  recent  paper  written  jointly  with  Prem  S.  Puri. 




Since that paper is itself a summary report, the best way to "symbolize" the essence of our efforts briefly seems to be through extensive quotes from it, including pictorial illustrations.

Figure 3 reproduces the title page of our article. The article is concerned with the chance mechanism of damage to living cells caused by irradiation. One aspect of the "damage" may be the cell becoming cancerous. In parallel with the "damage," we consider the mechanism of possible "repair." Figures 4 and 5 illustrate the observable phenomena that our "structural model" is intended to explain. These phenomena include the difference between the effects of so-called "high" and "low" linear energy transfer (LET).

The reader will notice that the role of "egg-masses" in Dr. Beall's studies is now played by "primary" particles of irradiation. Their temporal distribution can be estimated experimentally, in contrast with the practical impossibility of counting the egg-masses. On the other hand, while in Dr. Beall's situation the basic observables are counts of larvae in the test plots, in the problem of carcinogenesis the corresponding entities, namely the "secondary particles" and their "hits" in the targets, are impossible to count. The exception is the possibility of concluding that the number of "unrepaired" hits is zero, etc.

One  easily  illustrated  common  element  is  the  "area"  (or 
"volume")  of  accessibility. 
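The one observable named above, that a target has no unrepaired hits, can be written down explicitly under simple assumptions that mirror Dr. Beall's setting. In this sketch, which is not the Neyman-Puri structural model, we assume hypothetically that the number of primary particles reaching a target is Poisson, that each primary produces a Poisson number of secondary hits (its cluster), and that each hit is repaired independently with a fixed probability. The function name is hypothetical.

```python
import math


def prob_no_unrepaired_hits(primary_mean, cluster_mean, repair_prob):
    """P(zero unrepaired hits in a target) under hypothetical assumptions:
    Poisson(primary_mean) primaries, each giving Poisson(cluster_mean) hits,
    each hit repaired independently with probability repair_prob."""
    # Unrepaired hits from one primary are Poisson(cluster_mean * (1 - repair_prob)).
    unrepaired_mean = cluster_mean * (1.0 - repair_prob)
    # Primaries leaving at least one unrepaired hit form a thinned Poisson count.
    return math.exp(-primary_mean * (1.0 - math.exp(-unrepaired_mean)))


# Both settings give 2.0 unrepaired hits per target on average.
many_small = prob_no_unrepaired_hits(primary_mean=20.0, cluster_mean=0.2, repair_prob=0.5)
few_large = prob_no_unrepaired_hits(primary_mean=0.5, cluster_mean=8.0, repair_prob=0.5)
print(round(many_small, 3), round(few_large, 3))  # 0.149 0.612
```

For the same mean number of unrepaired hits, packing the hits into a few large clusters leaves far more targets with no unrepaired hit than spreading them over many small clusters. This is a property of the arithmetic of clustering; the sketch makes no claim about which regime corresponds to high or low LET in real cells.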


Figure 3. Title page of the article by Jerzy Neyman and Prem S. Puri, "A structural model of radiation effects in living cells."


Figure 4. Incidence of myeloid leukemia in relation to dose and dose rate of gamma radiation. One rad = 0.01 J/kg.

Figure 5. Life shortening in female mice as influenced by dose rate of gamma rays and neutrons. Open symbols represent gamma rays; filled symbols, neutrons.


6. Theory of Epidemics (Outdated). Our Stat. Lab.'s effort at a theory of epidemics was published in 1964 in a paper co-authored by E.L. Scott and myself. The reference is "A Stochastic Model of Epidemics" (Stochastic Models in Medicine and Biology, J. Gurland, ed., The University of Wisconsin Press, 1964). The paper was inspired by Norman T.J. Bailey's book The Mathematical Theory of Epidemics, which summarized quite a few earlier investigations, beginning with that of McKendrick of 1926. One of the basic assumptions of many of these works was that, given a population including so many susceptibles, the appearance of a single infectious creates a probability of contracting the disease that is the same for each of the susceptibles. As Bailey points out, an assumption of this kind may be realistic for a dormitory of a boarding school but not for a city, and certainly not for a country. The stochastic model we produced is explicit in recognizing the lack of uniformity of the habitat. For a time, the model appeared reasonably realistic. However, during the winter quarter of 1978, we became aware of an important lack of realism. The ideas underlying an effort at modernization are described in the next section. The basic assumptions of the "outdated" theory are as follows.

(i) Number Infected by a Single Infectious. Consider an infinite plane H described as the habitat. It is assumed that to each point in H with coordinates $u = (u_1, u_2)$ there corresponds a random variable v(u) representing the number of susceptibles who would be infected if at that point there were a single infectious. While the actual distribution of v(u) is left unspecified, it is assumed that the variables v(u) corresponding to different points in the habitat are mutually independent. In fact, it is assumed that the variables v(u) are independent of all other random variables of the system and that $P\{v(u) = 0\} < 1$.

(ii) Dispersal of Infected. It was assumed that during the "latent period" T (the same for all infected) the individuals infected at u travel independently of each other. Furthermore, it was assumed that to each point u in the habitat there corresponds a function f(x|u) representing the probability density of the location $x = (x_1, x_2)$ where an individual infected at u becomes infectious. Except for certain conditions of regularity, the function f(x|u), the "dispersal function," is left unspecified.

(iii) Immigration. The term "immigrants" is used to describe real infectious immigrants and also local inhabitants who become infectious "spontaneously," perhaps due to mutations of bacteria in their bodies. It is postulated that the appearance of an infectious "immigrant" is governed by a density function $\lambda(u)$ defined over the whole habitat and subject to certain conditions of regularity.


(iv) Discrete Generations. It was assumed that the duration of infectiousness is zero and that all occurrences of infection over the habitat are simultaneous. In consequence, the development of an epidemic is divided into discrete generations.
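Assumptions (i) to (iv) translate almost line by line into a simulation. Because the theory leaves v(u), f(x|u) and the immigration density unspecified, this sketch fills them in with hypothetical choices: v(u) is Poisson with a location-dependent mean, f(x|u) is an isotropic Gaussian, and immigrants appear uniformly over a square window. The function names are hypothetical, and the code illustrates the mechanism only; it does not reproduce the 1964 results.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def run_epidemic(rng, mean_infected, spread=1.0, immigrant_mean=0.0,
                 window=10.0, start=((0.0, 0.0),), max_generations=50):
    """Sizes of successive generations of infectious individuals.

    Hypothetical choices: v(u) is Poisson with mean mean_infected(u1, u2);
    the dispersal f(x|u) is an isotropic Gaussian with s.d. `spread`;
    immigrants per generation are Poisson(immigrant_mean), placed uniformly
    in the square [-window, window]^2 (a constant immigration density there).
    """
    infectious = list(start)
    sizes = [len(infectious)]
    for _ in range(max_generations):
        next_gen = []
        for u1, u2 in infectious:
            # v(u) susceptibles infected at u, each travelling during the latent period
            for _ in range(poisson(rng, mean_infected(u1, u2))):
                next_gen.append((u1 + rng.gauss(0.0, spread),
                                 u2 + rng.gauss(0.0, spread)))
        for _ in range(poisson(rng, immigrant_mean)):
            next_gen.append((rng.uniform(-window, window), rng.uniform(-window, window)))
        infectious = next_gen
        sizes.append(len(infectious))
        if not infectious and immigrant_mean == 0.0:
            break  # the epidemic has died out and nothing can restart it
    return sizes


def half_everywhere(u1, u2):
    """Hypothetical mean of v(u): 0.5 at every point of the habitat."""
    return 0.5


rng = random.Random(1964)
totals = [sum(run_epidemic(rng, half_everywhere)) for _ in range(4000)]
print(sum(totals) / len(totals))
```

With a constant mean of 0.5 and no immigrants, the generation sizes form a Galton-Watson branching process, so the average total size should be close to 1/(1 - 0.5) = 2, counting the initial infectious. Making mean_infected depend on location, for example smaller inside a hypothetical region with better hygiene, lets one explore the kind of question discussed below.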

The  relevant  mathematics  involves  the  following  two  concepts. 

(i) The random variable v(u), the number infected by a single infectious at a specified point $u = (u_1, u_2)$ in the habitat, and

(ii) the random variable, say y(X|u), the number infected somewhere in the habitat (X is a random variable) by a single individual of the earlier generation of the epidemic who became infected at a specified point u in the habitat.

The specification of a particular kind of epidemic, say of
polio,  depended  on  two  families  of  functions,  v(u)  (=  the  sizes 
of  "clusters"  centered  at  u)  and  the  dispersal  function  f(x|u), 
both  subject  to  certain  conditions  of  regularity. 

The subjects of study included the possibility that an epidemic started by a single infectious might get "out of hand," as was once the case with a polio epidemic. Among the particular cases considered was the possibility that a region R, marked by highly hygienic conditions, would escape the outbreak of a substantial epidemic while in the rest of the habitat that same epidemic got "out of hand." Two particular theorems, which in private conversations were called "democracy theorems," indicated that efforts at the establishment of such especially "healthy" regions would be futile.
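The democracy theorems themselves are not restated here. As a rough illustration of their intuition only, the sketch below collapses the habitat into two types, individuals infected inside the hygienic region R and those infected elsewhere, and builds the matrix of mean numbers of new cases of each type, a standard multi-type branching approximation. All parameter values (`m_r`, `m_o`, `stay_r`, `enter_r`) are hypothetical. For a mean matrix with all entries positive, the expected numbers of cases of every type eventually grow like ρⁿ, where ρ is its largest eigenvalue, so cases inside R grow whenever ρ > 1.

```python
import math

def mean_matrix(m_r, m_o, stay_r, enter_r):
    """Mean new cases by type: 0 = infected inside R, 1 = infected outside R.

    m_r, m_o : mean cluster size for an infectious inside / outside R
    stay_r   : P(someone infected inside R becomes infectious inside R)
    enter_r  : P(someone infected outside R becomes infectious inside R)
    Clusters are placed where the infectious person is (hypothetical model).
    """
    return [[stay_r * m_r, (1 - stay_r) * m_o],
            [enter_r * m_r, (1 - enter_r) * m_o]]

def spectral_radius_2x2(m):
    """Largest eigenvalue of a 2x2 non-negative matrix (real by Perron-Frobenius)."""
    (a, b), (c, d) = m
    return (a + d + math.sqrt((a - d) ** 2 + 4 * b * c)) / 2

def expected_generations(m, start, n_gen):
    """Expected cases by type in generations 0..n_gen: v_{n+1} = v_n M."""
    v, out = list(start), [list(start)]
    for _ in range(n_gen):
        v = [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(v))]
        out.append(v)
    return out

M = mean_matrix(m_r=0.5, m_o=2.0, stay_r=0.9, enter_r=0.05)
print(spectral_radius_2x2(M))  # about 1.90
print([round(v[0], 3) for v in expected_generations(M, [1.0, 0.0], 8)])
```

Here an individual infected inside R produces only 0.65 new cases on average, yet ρ ≈ 1.90, and the expected number of cases inside R, after an initial decline, eventually grows at roughly that rate.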


7. Theory of Epidemics (Modernized). During the winter quarter of 1978, discussion of the theory of epidemics just described benefited from the participation of Mrs. Florence Morrison of the California State Department of Health. We also had several other rather interested and active members of the group, of whom I shall mention two visitors from abroad: Dr. S. Kwesi Ddoom, a Fulbright Fellow from Uganda, and Dr. Luis R. Perrichi from Venezuela.

Mrs. Morrison's most valuable contribution was the remark that the real habitats represented by entire countries are much more heterogeneous than the older theory presupposed.

With reference to an epidemic of a communicable disease, Mrs. Morrison contended (and everyone agreed) that real habitats, such as the state of California, are stratified according to the socio-economic status of the population. This stratification influences the development of an epidemic. At the very least, three categories of locations have to be considered, depending on the income of the inhabitants: "high," "middle," and "low" (say, slums, which is the term I used). There was a consensus of opinion that the number infected by a single infectious depends not only on the region, say R_i, in which the infection takes place, but also on the region, say R_j, where the infecting individual lives. For example, if an inhabitant of slums suddenly becomes infectious in the locality he inhabits, he is likely to infect many more people around him than would a visiting inhabitant of a high income region, etc.

The discussion that followed resulted in a somewhat unusual "take home exam" that I formulated last year; see the next page.

The result of the exam proved quite interesting to several participants in the discussion, including myself. While the subdivision of California into only three different socio-economic regions represented an obvious over-simplification, the results obtained appear instructive, and there are plans afoot to produce a paper for publication.

8. Concluding Remarks. As mentioned at the outset, in selecting the subject of this Colloquium presentation, I had in mind to illustrate the phenomenon of the evolution of an idea. The idea of the mechanism of "clustering" does not represent anything unique. Many other fruitful ideas also evolve; otherwise, they would hardly be considered "fruitful." Ordinarily, the process of substantial evolution of a simple idea takes quite some time, much longer than the time we ordinarily spend in learning the contemporary state of that evolving idea. Thus, the phenomenon of the evolution escapes our attention. Yet, it seems interesting.


Statistics 2S8
Winter 1978

Mr. Neyman


TAKE  HOME  FINAL  EXAM 

Due for delivery and discussion on Thursday, March 23, 12:30-3:30 pm, in Room 72 Evans.

Your presence at the discussion IS NECESSARY.


Instructions: Write clearly and tidily. Use ink or an intense black pencil.

* * * * * * * 


Problem 1. State the basic assumptions of the theory presented in lectures and describe the principal results (e.g., what are the "democracy" theorems?). Use your own words. Do not copy from the published paper.


Problem 2. Criticize the basic assumptions of the theory, even with reference to communicable diseases spread through personal contacts between infectious and susceptibles, like coughing. How should the basic assumptions be modified to make the theory more realistic?


Problem 3. Use computer facilities to simulate the development of an epidemic in conditions slightly more realistic than those described in the early part of the course.

Consider a habitat composed of three regions, R1, R2, and R3:

R1: high income region, with sparse population and facilities for travel.

R2: middle income region (typified by Berkeley), with substantially denser population and with reasonable facilities for travel.

R3: low income region, with very dense population and very limited travel opportunities.

Assume that each of the three regions is uniform in all respects and assign to them numerical values of the various relevant parameters (e.g., of the probabilities that an individual inhabiting R_i will become infectious in R_j, etc.). The values assigned should be consistent with your intuition, but different from those of your colleagues (consult with them!). Next, consider an epidemic initiated by a single individual who became infectious in one of the three regions. Then, use the Monte Carlo simulation technique to generate 100 epidemics and calculate the mean number of cases in the successive generations of the epidemic and the mean total size of the epidemic. What about the "democratic" theorems?
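Problem 3 might be approached along the following lines. This is a minimal sketch under stated assumptions, not a reference solution: an individual's type is the home region; `travel[j][i]` is the probability that a resident of R_j becomes infectious in R_i; a resident of R_j who is infectious in R_i infects a Poisson number of people with mean `mean_inf[i][j]` (reflecting Mrs. Morrison's point that this depends on both regions); and everyone infected in R_i is assumed to be a resident of R_i. Regions are indexed 0, 1, 2 for R1, R2, R3, and all numerical values are hypothetical placeholders. The function `expected_sizes` gives the exact means under the same assumptions, as a check on the Monte Carlo averages.

```python
import math
import random

def poisson(rng, mean):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def one_epidemic(travel, mean_inf, home, region, n_gen, rng):
    """Sizes of generations 0..n_gen of one simulated epidemic.

    Generation 0 is a single resident of R_home who became infectious in
    R_region. Later cases are tracked by home region (hypothetical model).
    """
    if n_gen < 1:
        raise ValueError("n_gen must be at least 1")
    counts = [0, 0, 0]  # current generation, by home region
    counts[region] = poisson(rng, mean_inf[region][home])
    sizes = [1, sum(counts)]
    for _ in range(n_gen - 1):
        nxt = [0, 0, 0]
        for j in range(3):
            for _ in range(counts[j]):
                # Region where this resident of R_j becomes infectious.
                i = rng.choices(range(3), weights=travel[j])[0]
                nxt[i] += poisson(rng, mean_inf[i][j])
        counts = nxt
        sizes.append(sum(counts))
    return sizes

def monte_carlo(travel, mean_inf, home, region, n_gen=8, n_epidemics=100, seed=1978):
    """Mean size of each generation and mean total size over n_epidemics runs."""
    rng = random.Random(seed)
    runs = [one_epidemic(travel, mean_inf, home, region, n_gen, rng)
            for _ in range(n_epidemics)]
    mean_by_gen = [sum(r[g] for r in runs) / n_epidemics for g in range(n_gen + 1)]
    mean_total = sum(sum(r) for r in runs) / n_epidemics
    return mean_by_gen, mean_total

def expected_sizes(travel, mean_inf, home, region, n_gen):
    """Exact expected generation sizes under the same assumptions."""
    # a[j][i]: mean number of new R_i residents infected by one R_j resident.
    a = [[travel[j][i] * mean_inf[i][j] for i in range(3)] for j in range(3)]
    v = [0.0, 0.0, 0.0]
    v[region] = mean_inf[region][home]
    out = [1.0, sum(v)]
    for _ in range(n_gen - 1):
        v = [sum(v[j] * a[j][i] for j in range(3)) for i in range(3)]
        out.append(sum(v))
    return out

# Hypothetical placeholder values for R1 (high), R2 (middle), R3 (low income).
TRAVEL = [[0.60, 0.30, 0.10],   # resident of R1 becomes infectious in R1, R2, R3
          [0.10, 0.80, 0.10],   # resident of R2
          [0.02, 0.08, 0.90]]   # resident of R3
MEAN_INF = [[0.3, 0.4, 0.5],    # infectious in R1; resident of R1, R2, R3
            [0.6, 0.8, 1.0],    # infectious in R2
            [1.0, 1.2, 1.5]]    # infectious in R3

if __name__ == "__main__":
    for r in range(3):
        means, total = monte_carlo(TRAVEL, MEAN_INF, home=r, region=r)
        print(f"start in R{r + 1}: mean total {total:.1f}, by generation {means}")
```

Because a supercritical epidemic grows geometrically, `n_gen` should be kept modest. Comparing runs started in each of the three regions is one way to approach the closing question about the "democratic" theorems.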


Good  Luck! 


ACKNOWLEDGEMENTS 


This paper was prepared using the facilities of the Statistical Laboratory with partial support from the Office of Naval Research (ONR N00014-75-C-0159), the Department of the Army (Grant DA AG 29 76 G 0167), and the National Institute of Environmental Health Sciences (2 R01 ES01299-16). This support is gratefully acknowledged. The opinions expressed are those of the author.


